
fix: optimize SortedStreamSetObjectsList with binary search (#3137) #3145

Open
knightabir wants to merge 1 commit into AutoMQ:main from knightabir:fix/sorted-stream-set-objects-binary-search

@knightabir

Description

This PR optimizes SortedStreamSetObjectsList by replacing LinkedList with ArrayList and implementing binary search for insertion, removal, and contains operations.

Related Issue

Fixes #3137

Problem

The current implementation uses LinkedList with linear O(n) scans for insertion and removal operations, leading to poor performance as metadata size grows. The code contained explicit TODO comments requesting binary search optimization.

Solution

  • Replaced LinkedList with ArrayList for O(1) random access
  • Implemented binary search using Collections.binarySearch()
  • Added helper methods findInsertionIndex() and findExactMatch()
  • Handled duplicate orderId values correctly (compareTo uses orderId, while equals uses objectId)
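The switch to ArrayList matters for `Collections.binarySearch()` itself: per its Javadoc, the method runs in log(n) time on lists that implement `RandomAccess`, but on a large list that does not (such as `LinkedList`) it falls back to an iterator-based search that performs O(n) link traversals. The snippet below is a minimal illustration; the class and method names are hypothetical and not part of the PR.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

public class RandomAccessCheck {
    // Collections.binarySearch() uses index-based access (O(log n) get() calls)
    // when the list implements RandomAccess; for large lists that do not, it
    // walks the list with a ListIterator instead, costing O(n) link traversals.
    static boolean supportsIndexedBinarySearch(List<?> list) {
        return list instanceof RandomAccess;
    }

    public static void main(String[] args) {
        // prints "ArrayList: true"
        System.out.println("ArrayList: " + supportsIndexedBinarySearch(new ArrayList<Integer>()));
        // prints "LinkedList: false"
        System.out.println("LinkedList: " + supportsIndexedBinarySearch(new LinkedList<Integer>()));
    }
}
```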

Changes Made

File Modified: metadata/src/main/java/org/apache/kafka/metadata/stream/SortedStreamSetObjectsList.java

Key Changes:

  1. Data Structure: LinkedList → ArrayList
  2. Add Operation: Implemented binary search for insertion
  3. Remove Operation: Optimized with binary search (implemented TODO)
  4. Contains Operation: Optimized with binary search
  5. Added Helper Methods:
    • findInsertionIndex(): Uses Collections.binarySearch() to find insertion point
    • findExactMatch(): Handles duplicate orderId values correctly
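A minimal sketch of how a `findInsertionIndex()` helper could work, assuming a hypothetical `StreamSetObject` class that, like the real object described here, sorts by `orderId` and defines identity by `objectId`. It also assumes that a new object whose `orderId` is already present goes after the existing ones; this sketch does not confirm that the PR or the original `LinkedList` code resolves ties this way.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedInsertSketch {
    // Hypothetical stand-in for the real stream set object: ordered by
    // orderId (compareTo) but identified by objectId (equals/hashCode).
    static final class StreamSetObject implements Comparable<StreamSetObject> {
        final long objectId;
        final long orderId;

        StreamSetObject(long objectId, long orderId) {
            this.objectId = objectId;
            this.orderId = orderId;
        }

        @Override
        public int compareTo(StreamSetObject other) {
            return Long.compare(orderId, other.orderId);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof StreamSetObject && ((StreamSetObject) o).objectId == objectId;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(objectId);
        }
    }

    // ArrayList gives O(1) get(i), which Collections.binarySearch relies on.
    final List<StreamSetObject> list = new ArrayList<>();

    // Hypothetical findInsertionIndex(): returns the position just after any
    // elements whose orderId equals obj's (assumed tie rule: append after equals).
    int findInsertionIndex(StreamSetObject obj) {
        int idx = Collections.binarySearch(list, obj);
        if (idx < 0) {
            return -idx - 1; // decode -(insertionPoint) - 1
        }
        // binarySearch may return any element of a run of equal orderIds,
        // so step right until the run ends.
        while (idx < list.size() && list.get(idx).compareTo(obj) == 0) {
            idx++;
        }
        return idx;
    }

    void add(StreamSetObject obj) {
        // O(log n) search, then ArrayList.add(index, e) shifts the tail: O(n).
        list.add(findInsertionIndex(obj), obj);
    }
}
```

Stepping past the tie run costs O(k) for k equal orderIds; a hand-written upper-bound binary search would avoid that, at the cost of not reusing `Collections.binarySearch()`.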

Performance Impact

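| Operation | Before | After | Improvement |
|---|---|---|---|
| Search | O(n) | O(log n) | Logarithmic |
| Insert | O(n) scan | O(log n) search + O(n) shift | Still O(n) worst case; the linear part becomes a bulk array copy instead of a per-node comparison scan |
| Remove | O(n) scan | O(log n) search + O(n) shift | Same as Insert |
| Contains | O(n) | O(log n) search + scan of entries sharing the same orderId | Logarithmic when orderId values are mostly distinct |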

Special Handling

The implementation correctly handles the case where:

  • compareTo() uses orderId for sorting
  • equals() uses objectId for identity

When multiple objects have the same orderId, the binary search locates one of them, then scans forward and backward to find the exact match by objectId.
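The forward/backward scan can be sketched as follows, again using a hypothetical `StreamSetObject` (sorted by `orderId`, equal by `objectId`). The `findExactMatch()`, `contains()`, and `remove()` methods here are illustrative stand-ins, not the PR's code.

```java
import java.util.Collections;
import java.util.List;

public class ExactMatchSketch {
    // Hypothetical stand-in: sorted by orderId, equal by objectId.
    static final class StreamSetObject implements Comparable<StreamSetObject> {
        final long objectId;
        final long orderId;

        StreamSetObject(long objectId, long orderId) {
            this.objectId = objectId;
            this.orderId = orderId;
        }

        @Override
        public int compareTo(StreamSetObject other) {
            return Long.compare(orderId, other.orderId);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof StreamSetObject && ((StreamSetObject) o).objectId == objectId;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(objectId);
        }
    }

    // Binary search lands somewhere in the run of elements sharing target's
    // orderId; scan that run in both directions for the element that
    // equals() target. Returns its index, or -1 if absent.
    static int findExactMatch(List<StreamSetObject> list, StreamSetObject target) {
        int hit = Collections.binarySearch(list, target);
        if (hit < 0) {
            return -1; // no element has target's orderId
        }
        for (int i = hit; i >= 0 && list.get(i).compareTo(target) == 0; i--) {
            if (list.get(i).equals(target)) {
                return i;
            }
        }
        for (int i = hit + 1; i < list.size() && list.get(i).compareTo(target) == 0; i++) {
            if (list.get(i).equals(target)) {
                return i;
            }
        }
        return -1;
    }

    static boolean contains(List<StreamSetObject> list, StreamSetObject target) {
        return findExactMatch(list, target) >= 0;
    }

    static boolean remove(List<StreamSetObject> list, StreamSetObject target) {
        int idx = findExactMatch(list, target);
        if (idx < 0) {
            return false;
        }
        list.remove(idx); // remove(int): shifts the remaining tail left
        return true;
    }
}
```

One caveat of this approach, as sketched: the binary search uses the argument's `orderId` to locate the candidate run, so a lookup only succeeds when the object passed to `remove()` or `contains()` carries the same `orderId` as the stored copy. A plain linear `equals()` scan has no such requirement.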

Testing

✅ All existing tests pass (SortedStreamSetObjectsListTest.testSorted())
✅ No changes to test files required - behavior is preserved
✅ Verified with: ./gradlew test --tests SortedStreamSetObjectsListTest

Checklist

  • Code compiles without errors
  • All existing tests pass
  • Behavior-preserving change
  • Implements existing TODO comments
  • Added null safety checks
  • Added proper JavaDoc documentation
  • Follows project coding standards
  • No breaking changes

Additional Notes

This is a behavior-preserving optimization that directly implements the TODO comments present in the original code:

// TODO: optimize by binary search  (in add method)
// TODO: optimize by binary search  (in remove method)

The implementation uses Java's standard Collections.binarySearch() for reliability and maintainability.
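For reference, `Collections.binarySearch()` returns the index of a matching element when one exists and `-(insertionPoint) - 1` otherwise, and its Javadoc makes no guarantee about which element is returned when several compare equal. The second property is why a neighbor scan is needed when `orderId` values repeat. A small demonstration on plain integers (the class and helper names are hypothetical):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BinarySearchContract {
    // Hypothetical helper: turns a Collections.binarySearch result into an index
    // where the key can be inserted while keeping the list sorted. A
    // non-negative result (a match) is already such an index; a negative one
    // encodes -(insertionPoint) - 1.
    static int insertionIndex(int searchResult) {
        return searchResult >= 0 ? searchResult : -searchResult - 1;
    }

    public static void main(String[] args) {
        List<Integer> sorted = Arrays.asList(1, 3, 5, 5, 5, 9);

        int miss = Collections.binarySearch(sorted, 4);
        // prints "-3 -> insert at 2"
        System.out.println(miss + " -> insert at " + insertionIndex(miss));

        // The Javadoc only guarantees *some* index of an equal element here,
        // i.e. one of 2, 3 or 4.
        int hit = Collections.binarySearch(sorted, 5);
        System.out.println("found 5 at index " + hit);
    }
}
```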


Problem:
- SortedStreamSetObjectsList used LinkedList with linear O(n) scans for
  insertion and removal operations
- Poor performance as metadata size grows
- Explicit TODO comments requesting binary search optimization

Solution:
- Replace LinkedList with ArrayList for efficient O(1) random access
- Implement binary search using Collections.binarySearch()
- Handle duplicate orderId cases correctly (compareTo uses orderId while
  equals uses objectId)

Changes:
- Switch from LinkedList to ArrayList
- Implement findInsertionIndex() using Collections.binarySearch()
- Implement findExactMatch() for remove/contains with binary search
- Add proper null safety and type checking
- Handle duplicates by scanning nearby elements after binary search

Performance Impact:
- Search operations: O(n) -> O(log n)
- Insertion/removal: O(n) scan -> O(log n) search + O(n) array shift
- Worst-case insertion/removal stays O(n), but the linear part is now a
  contiguous array copy rather than a node-by-node comparison scan

Testing:
- All existing tests pass
- Behavior-preserving change that implements existing TODOs

Fixes AutoMQ#3137
@CLAassistant

CLAassistant commented Jan 12, 2026

CLA assistant check
All committers have signed the CLA.

@superhx
Collaborator

superhx commented Jan 16, 2026

@knightabir Hey! Thanks a ton for your PR! It's really awesome to see your contribution. By the way, would you be super amazing and add a test for it?

@knightabir
Author

> @knightabir Hey! Thanks a ton for your PR! It's really awesome to see your contribution. By the way, would you be super amazing and add a test for it?

Yes, sure. I will add the test ASAP.


Development

Successfully merging this pull request may close these issues.

Implement existing TODO: binary search optimization in SortedStreamSetObjectsList

3 participants